Knowledge-Based Weak Supervision for Information Extraction of Overlapping Relations

Authors

  • Raphael Hoffmann
  • Congle Zhang
  • Xiao Ling
  • Luke S. Zettlemoyer
  • Daniel S. Weld

Abstract

Information extraction (IE) holds the promise of generating a large-scale knowledge base from the Web’s natural language text. Knowledge-based weak supervision, which uses structured data to heuristically label a training corpus, works towards this goal by enabling the automated learning of a potentially unbounded number of relation extractors. Recently, researchers have developed multi-instance learning algorithms to combat the noisy training data that heuristic labeling can produce, but their models assume relations are disjoint; for example, they cannot extract both Founded(Jobs, Apple) and CEO-of(Jobs, Apple). This paper presents a novel approach for multi-instance learning with overlapping relations that combines a sentence-level extraction model with a simple, corpus-level component for aggregating the individual facts. We apply our model to learn extractors for NY Times text using weak supervision from Freebase. Experiments show that the approach runs quickly and yields surprising gains in accuracy, at both the aggregate and sentence level.
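The central idea in the abstract is that aggregating sentence-level predictions separately for each relation lets one entity pair hold several relations at once. The toy sketch below illustrates this idea. It assumes the corpus-level component is a deterministic OR, meaning a relation holds for a pair if at least one sentence expresses it. The knowledge base, the sentences, and the keyword rule `toy_sentence_extractor` are hypothetical stand-ins. The multi-instance training procedure itself is not shown.

```python
from collections import defaultdict
from typing import Iterable, Optional

# Hypothetical toy knowledge base (stand-in for Freebase):
# entity pair -> set of relations the KB says hold. Note the overlap.
KB: dict[tuple[str, str], set[str]] = {
    ("Jobs", "Apple"): {"Founded", "CEO-of"},
}

# Hypothetical toy corpus: sentences already tagged with an entity pair.
SENTENCES = [
    ("Jobs", "Apple", "Steve Jobs founded Apple in 1976."),
    ("Jobs", "Apple", "Jobs, the CEO of Apple, unveiled the iPhone."),
    ("Jobs", "Apple", "Jobs left Apple in 1985."),  # expresses neither relation: label noise
]


def weak_labels(kb, sentences):
    """Heuristic labeling: group sentences into one bag per entity pair and
    attach the KB's relations as a bag-level label. Which sentence expresses
    which relation is unknown, which is why multi-instance learning is needed."""
    bags = defaultdict(list)
    for e1, e2, text in sentences:
        bags[(e1, e2)].append(text)
    return {pair: (texts, kb.get(pair, set())) for pair, texts in bags.items()}


def toy_sentence_extractor(text: str) -> Optional[str]:
    """Hypothetical keyword rule standing in for the learned sentence-level
    model; returns one relation per sentence, or None for 'no relation'."""
    lowered = text.lower()
    if "founded" in lowered:
        return "Founded"
    if "ceo" in lowered:
        return "CEO-of"
    return None


def aggregate(predictions: Iterable[Optional[str]]) -> set[str]:
    """Corpus-level deterministic OR (assumed): a relation holds for the pair
    if at least one sentence is predicted to express it. Relations are not
    mutually exclusive, so a pair can receive several of them."""
    return {rel for rel in predictions if rel is not None}


bags = weak_labels(KB, SENTENCES)
texts, kb_relations = bags[("Jobs", "Apple")]
predicted = aggregate(toy_sentence_extractor(t) for t in texts)
print(sorted(predicted))  # ['CEO-of', 'Founded']
print(predicted == kb_relations)  # True
```

Each sentence still receives a single label, yet the pair ends up with both facts. A model that assigned only one relation per entity pair could not produce this output.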


Similar resources

Effectively Creating Weakly Labeled Training Examples via Approximate Domain Knowledge

One of the challenges to information extraction is the requirement of human annotated examples, commonly called gold-standard examples. Many successful approaches alleviate this problem by employing some form of distant supervision, i.e., look into knowledge bases such as Freebase as a source of supervision to create more examples. While this is perfectly reasonable, most distant supervision me...


Using Commonsense Knowledge to Automatically Create (Noisy) Training Examples from Text

One of the challenges to information extraction is the requirement of human annotated examples. Current successful approaches alleviate this problem by employing some form of distant supervision, i.e., look into knowledge bases such as Freebase as a source of supervision to create more examples. While this is perfectly reasonable, most distant supervision methods rely on a hand coded background ...


Distant Supervision for Relation Extraction Using Tree Kernels

In this paper we define a simple Relation Extraction system based on SVMs using tree kernels and employing a weakly supervised approach, known as Distant Supervision (DS). Our method uses the simple one-versus-all strategy to handle overlapping relations, i.e., defined on the same pair of entities. The DS data is defined over the New York Times corpus by means of Freebase as an external knowled...


A Comparison of Weak Supervision methods for Knowledge Base Construction

We present a comparison of weak and distant supervision methods for producing proxy examples for supervised relation extraction. We find that knowledge-based weak supervision tends to outperform popular distant supervision techniques, providing a higher yield of positive examples and more accurate models.


Ontological Smoothing for Relation Extraction

There is increasing interest in relation extraction, methods that convert natural language text into structured knowledge. The most successful techniques use supervised machine learning to generate extractors from sentences which have been labeled with the arguments of the relations of interest. Unfortunately, these methods require hundreds or thousands of training examples, which are expensive...




Publication date: 2011